
    Multi-view 3D Reconstruction of a Scene Containing Independently Moving Objects

    In this thesis, the structure from motion problem for calibrated scenes containing independently moving objects (IMOs) is studied. For this purpose, the overall reconstruction process is partitioned into stages. The first stage deals with the fundamental problem of estimating structure and motion from only two views. The process starts by finding salient features using a sub-pixel version of the Harris corner detector. The features are matched with a similarity- and neighborhood-based matcher. To reject outliers and estimate the fundamental matrix of the two images, robust estimation is performed via RANSAC and the normalized 8-point algorithm. Two-view reconstruction is finalized by decomposing the fundamental matrix and estimating the 3D point locations by triangulation. The second stage generalizes the two-view algorithm to the N-view case. This is accomplished by first reconstructing an initial framework from the first stage and then relating additional views by finding correspondences between each new view and the already reconstructed views. In this way, 3D-2D projection pairs are determined, and the projection matrix of the new view is estimated by a robust procedure. The final stage deals with scenes containing IMOs. To reject correspondences due to moving objects, the parallax-based rigidity constraint is used. In utilizing this constraint, an automatic background pixel selection algorithm is developed, and an IMO rejection algorithm is proposed. The results of the proposed algorithm are compared against those of a robust outlier rejection algorithm and found to be quite promising in terms of execution time vs. reconstruction quality.
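
    The core of the two-view stage above, RANSAC aside, is the normalized 8-point algorithm for the fundamental matrix. The following NumPy sketch shows that single step on clean correspondences; function names and parameters are illustrative, not taken from the thesis.

```python
import numpy as np

def normalize_points(pts):
    # Hartley normalization: move the centroid to the origin and scale
    # so the mean distance from the origin is sqrt(2).
    centroid = pts.mean(axis=0)
    mean_dist = np.linalg.norm(pts - centroid, axis=1).mean()
    s = np.sqrt(2.0) / mean_dist
    T = np.array([[s, 0, -s * centroid[0]],
                  [0, s, -s * centroid[1]],
                  [0, 0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    return (T @ pts_h.T).T, T

def eight_point(pts1, pts2):
    # Normalized 8-point algorithm: returns F such that
    # x2^T F x1 = 0 for corresponding homogeneous points x1 <-> x2.
    x1, T1 = normalize_points(pts1)
    x2, T2 = normalize_points(pts2)
    # Each correspondence contributes one linear constraint on the 9 entries of F.
    A = np.column_stack([
        x2[:, 0] * x1[:, 0], x2[:, 0] * x1[:, 1], x2[:, 0],
        x2[:, 1] * x1[:, 0], x2[:, 1] * x1[:, 1], x2[:, 1],
        x1[:, 0], x1[:, 1], np.ones(len(x1))])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce the rank-2 constraint: a valid F has a zero singular value.
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    F = T2.T @ F @ T1          # undo the normalization
    return F / F[2, 2]
```

    In the full pipeline this estimator would be wrapped in RANSAC: sample 8 matches, fit F, count inliers by epipolar distance, repeat.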

    Fast outlier rejection by using parallax-based rigidity constraint for epipolar geometry estimation

    A novel approach is presented for rejecting correspondence outliers between frames using the parallax-based rigidity constraint for epipolar geometry estimation. In this approach, the invariance of the 3-D relative projective structure of a stationary scene over different views is exploited to eliminate outliers, mostly due to independently moving objects in a typical scene. The proposed approach is compared against a well-known RANSAC-based algorithm using a test bed. The results show that using the proposed technique as a preprocessing step before the RANSAC-based approach significantly decreases the execution time of the overall outlier rejection.

    Efficient Large Scale Multi-View Stereo for Ultra High Resolution Image Sets

    We present a new approach for large scale multi-view stereo matching, which is designed to operate on ultra high resolution image sets and efficiently compute dense 3D point clouds. We show that, by using a robust descriptor for matching purposes and high resolution images, we can skip the computationally expensive steps other algorithms require. As a result, our method has low memory requirements and low computational complexity while producing 3D point clouds containing virtually no outliers. This makes it exceedingly suitable for large scale reconstruction. The core of our algorithm is the dense matching of image pairs using DAISY descriptors, implemented so as to eliminate redundancies and optimize memory access. We use a variety of challenging data sets to validate and compare our results against other algorithms.
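
    The robustness of descriptor-based matching can be illustrated with a much-simplified, DAISY-inspired descriptor: oriented-gradient maps blurred with ring-dependent Gaussians and sampled on concentric circles around a pixel. This sketch departs from the real DAISY (which uses cumulative smoothing and specific grid parameters); all names and defaults below are illustrative.

```python
import numpy as np

def _gauss_kernel(sigma):
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def _smooth(img, sigma):
    # Separable Gaussian blur: rows, then columns.
    k = _gauss_kernel(sigma)
    out = np.apply_along_axis(np.convolve, 0, img, k, mode='same')
    return np.apply_along_axis(np.convolve, 1, out, k, mode='same')

def daisy_like(img, y, x, radius=15, rings=3, points=8, orients=8):
    # Simplified DAISY-style descriptor at pixel (y, x).
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)
    # One "layer" per orientation: positive gradient response along it.
    layers = [np.maximum(np.cos(ang - 2 * np.pi * o / orients), 0.0) * mag
              for o in range(orients)]
    samples = []
    for r in range(rings + 1):
        # Coarser smoothing for outer rings.
        blurred = [_smooth(L, 1.5 * (r + 1)) for L in layers]
        if r == 0:
            samples.append([L[y, x] for L in blurred])   # center histogram
        else:
            rad = radius * r / rings
            for p in range(points):
                a = 2 * np.pi * p / points
                yy = int(round(y + rad * np.sin(a)))
                xx = int(round(x + rad * np.cos(a)))
                samples.append([L[yy, xx] for L in blurred])
    d = np.asarray(samples).ravel()
    n = np.linalg.norm(d)
    return d / n if n > 0 else d
```

    Because it is built from gradients and normalized, the descriptor is unchanged by brightness offsets and contrast scaling of the input, which is one reason descriptor-based photo-consistency is robust where plain pixel differences fail.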

    Virtual View Generation with a Hybrid Camera Array

    Virtual view synthesis from an array of cameras has been an essential element of three-dimensional video broadcasting/conferencing. In this paper, we propose a scheme based on a hybrid camera array consisting of four regular video cameras and one time-of-flight depth camera. During rendering, we use the depth image from the depth camera as initialization, and compute a view-dependent scene geometry using constrained plane sweeping from the regular cameras. View-dependent texture mapping is then deployed to render the scene at the desired virtual viewpoint. Experimental results show that the addition of the time-of-flight depth camera greatly improves the rendering quality compared with an array of regular cameras with similar sparsity. In the application of 3D video broadcasting/conferencing, our hybrid camera system demonstrates great potential in reducing the amount of data for compression/streaming while maintaining high rendering quality.
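
    Plane sweeping scores a set of depth hypotheses by warping a neighboring view through the plane-induced homography and measuring photo-consistency; the paper's constrained version would restrict the hypotheses to a band around the time-of-flight depth. Below is a minimal unconstrained two-view sketch, assuming the convention X2 = R·X1 + t and a plane n·X = d in the reference camera frame; names are illustrative.

```python
import numpy as np

def plane_sweep_costs(I1, I2, K, R, t, depths, n=(0.0, 0.0, 1.0)):
    # For each depth hypothesis d, map reference pixels into the second
    # view through the plane-induced homography H = K (R + t n^T / d) K^-1
    # and score photo-consistency with a mean absolute difference.
    h, w = I1.shape
    Kinv = np.linalg.inv(K)
    ys, xs = np.mgrid[0:h, 0:w]
    x1 = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    n = np.asarray(n, dtype=float)
    t = np.asarray(t, dtype=float).reshape(3)
    costs = []
    for d in depths:
        H = K @ (R + np.outer(t, n) / d) @ Kinv
        x2 = H @ x1
        u = np.rint(x2[0] / x2[2]).astype(int)   # nearest-neighbor sampling
        v = np.rint(x2[1] / x2[2]).astype(int)
        ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        costs.append(np.abs(I1.ravel()[ok] - I2[v[ok], u[ok]]).mean())
    return np.array(costs)
```

    The depth with the lowest cost wins; a depth-camera prior turns the full sweep over `depths` into a short local search, which is where the hybrid array saves computation.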

    Template-free monocular reconstruction of deformable surfaces

    It has recently been shown that deformable 3D surfaces can be recovered from single video streams. However, existing techniques either require a reference view in which the shape of the surface is known a priori, which may often not be available, or require tracking points over long sequences, which is hard to do. In this paper, we overcome these limitations. To this end, we establish correspondences between pairs of frames in which the shape is different and unknown. We then estimate homographies between corresponding local planar patches in both images. These yield approximate 3D reconstructions of points within each patch up to a scale factor. Since we consider overlapping patches, we can enforce consistency over the whole surface. Finally, a local deformation model is used to fit a triangulated mesh to the 3D point cloud, which makes the reconstruction robust to both noise and outliers in the image data.
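
    The per-patch homographies can be estimated from point correspondences with the standard direct linear transform (DLT). A minimal unnormalized sketch for clean, well-spread correspondences follows; real image data would additionally need Hartley normalization and robust fitting.

```python
import numpy as np

def homography_dlt(src, dst):
    # DLT: each correspondence (x, y) -> (u, v) contributes two rows to
    # the system A h = 0; h (the 9 entries of H) is the null vector of A.
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, Vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]   # fix the projective scale
```

    Four non-collinear correspondences suffice; using all matches inside a patch gives a least-squares fit.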

    Structure from motion in dynamic scenes with multiple motions

    In this study, an algorithm is proposed to solve the multi-body structure from motion (SfM) problem for the single-camera case. The algorithm uses the epipolar criterion to segment the features belonging to independently moving objects. Once the features are segmented, the corresponding objects are reconstructed individually by applying a sequential algorithm, which uses the previous structure to estimate the pose of the current frame. A tracker is utilized to increase the baseline and improve the F-matrix estimation, which benefits both segmentation and 3D structure estimation. Experimental results on synthetic and real data demonstrate that our approach deals efficiently with the multi-body SfM problem.
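
    The epipolar criterion amounts to this: once F is estimated from the dominant (background) motion, correspondences whose first-order geometric (Sampson) error under F exceeds a threshold are assigned to independently moving objects. A sketch of that test; the threshold and names are illustrative.

```python
import numpy as np

def sampson_error(F, x1, x2):
    # First-order geometric (Sampson) error of correspondences under F.
    # x1, x2: (N, 3) homogeneous image points.
    Fx1 = (F @ x1.T).T
    Ftx2 = (F.T @ x2.T).T
    num = np.sum(x2 * Fx1, axis=1) ** 2
    den = Fx1[:, 0]**2 + Fx1[:, 1]**2 + Ftx2[:, 0]**2 + Ftx2[:, 1]**2
    return num / den

def segment_moving(F, x1, x2, thresh=1.0):
    # True where a correspondence violates the background epipolar geometry,
    # i.e. where the feature likely belongs to an independently moving object.
    return sampson_error(F, x1, x2) > thresh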

    Large-scale data for multiple-view stereopsis

    The seminal multiple-view stereo benchmark evaluations from Middlebury and by Strecha et al. have played a major role in propelling the development of multi-view stereopsis (MVS) methodology. The somewhat small size and variability of these data sets, however, limit their scope and the conclusions that can be derived from them. To facilitate further development within MVS, we here present a new and varied data set consisting of 80 scenes, seen from 49 or 64 accurate camera positions. This is accompanied by accurate structured light scans for reference and evaluation. In addition, all images are taken under seven different lighting conditions. As a benchmark, and to validate the use of our data set for obtaining reasonable and statistically significant findings about MVS, we have applied the three state-of-the-art MVS algorithms by Campbell et al., Furukawa et al., and Tola et al. to the data set. To do this we have extended the evaluation protocol from the Middlebury evaluation, necessitated by the more complex geometry of some of our scenes. The data set and accompanying evaluation framework are made freely available online. Based on this evaluation, we are able to observe several characteristics of state-of-the-art MVS, e.g. that there is a tradeoff between the quality of the reconstructed 3D points (accuracy) and how much of an object’s surface is captured (completeness). Also, several issues that we hypothesized would challenge MVS, such as specularities and changing lighting conditions, did not pose serious problems. Our study finds that the two most pressing issues for MVS are lack of texture and meshing (forming 3D points into closed triangulated surfaces).
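
    The accuracy/completeness tradeoff mentioned above is typically measured with nearest-neighbor distances between the reconstruction and the reference scan. A brute-force sketch of Middlebury-style metrics; the percentile and threshold defaults are illustrative, not this paper's exact protocol.

```python
import numpy as np

def nn_dist(a, b):
    # Distance from each point in a to its nearest neighbor in b (brute force;
    # real evaluations use a spatial index such as a k-d tree).
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.sqrt(d2.min(axis=1))

def accuracy(recon, gt, pct=90):
    # Accuracy: the pct-th percentile of reconstruction-to-reference distances.
    # Lower is better; the percentile makes the measure robust to stray outliers.
    return np.percentile(nn_dist(recon, gt), pct)

def completeness(recon, gt, thresh=0.01):
    # Completeness: fraction of reference points with a reconstructed
    # point within `thresh`. Higher is better.
    return (nn_dist(gt, recon) <= thresh).mean()
```

    The tradeoff is visible in the definitions: discarding uncertain points improves accuracy but lowers completeness, and vice versa.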

    DAISY: A Fast Descriptor for Dense Wide Baseline Stereo and Multiview Reconstruction

    Stereo reconstruction is a fundamental problem of computer vision. It has been studied for more than three decades, and significant progress has been made in recent years, as evidenced by the quality of the models now being produced. This progress is closely related to advances in other fields: with the emergence of low-cost, high-quality cameras, we now live in an era with an abundance of data for reconstruction. The multitude of images from numerous capture sources has sparked renewed interest in the stereo vision community, raising new challenges such as robustness to photometric and geometric variability and scalability with respect to the number of images and image resolution. In this thesis, we aim to find efficient, and therefore practical, algorithmic solutions for the two extreme ends of the stereo vision problem: first, we consider the two-input-image case where the cameras are placed far from each other, and then we investigate large-scale multi-view reconstruction for ultra-high resolution image sets. Both problems have unique challenges: in the first part we need to handle the large perspective distortions that the image texture undergoes, and in the second part we need to design an algorithm that can scale to very large sets of ultra-high resolution images using only a single standard computer. For the first problem, we design an efficient dense image descriptor, called DAISY, that is not only robust to photometric transforms like brightness and contrast changes but also robust to the perspective effects that viewpoint changes produce. We use the DAISY descriptor as a photo-consistency measure in an expectation-maximization framework with a global graph-cuts optimization algorithm to estimate depth and occlusion maps. We demonstrate very successful results on a variety of data sets, some of which have laser-scanned ground truths. After estimating the depth and occlusion maps, we introduce a technique to improve the surface reconstruction in occluded areas by extracting normal cues using simple binary classifiers trained on DAISY-like features. For the large-scale ultra-high resolution multi-view stereo problem, we design a very efficient local optimization algorithm for the depth estimation framework instead of the global one developed in the first part of the thesis. Scalability over the number of images is handled by representing the scene with a set of depth maps, and scalability over image resolution is handled by the use of a local approach to depth map estimation. We demonstrate state-of-the-art quality results for very large sets of very high resolution images, computed on a single standard computer in comparatively very short computation times. Overall, we show that the use of a distinctive and robust descriptor to measure photo-consistency allows us to avoid many complex stages that other algorithms require without sacrificing the accuracy of the results, and thus to scale up to large data sets easily.
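
    Representing the scene as a set of depth maps means the final point cloud is obtained by back-projecting each depth map through its camera. A minimal sketch of that step; the pose convention X_cam = R·X_world + t and all names here are assumptions for illustration.

```python
import numpy as np

def depth_map_to_points(depth, K, R, t):
    # Back-project a depth map into world coordinates, skipping invalid
    # (non-positive) depths. Convention: X_cam = R X_world + t.
    h, w = depth.shape
    vs, us = np.mgrid[0:h, 0:w]
    ok = depth > 0
    pix = np.stack([us[ok], vs[ok], np.ones(ok.sum())])
    rays = np.linalg.inv(K) @ pix                 # unit-depth camera rays
    X_cam = rays * depth[ok]
    X_world = R.T @ (X_cam - t.reshape(3, 1))     # invert the rigid transform
    return X_world.T
```

    Running this over every view's depth map and concatenating the results yields the merged cloud; per-view depth estimation stays local and memory-bounded, which is what makes the representation scale.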

    Multi-view 3D Reconstruction of a Scene Containing Independently Moving Objects (Turkish: Bağımsız olarak hareket eden nesneler içeren bir sahnenin çoklu resimlerden 3 boyutlu sahne yapısının çıkarılması)

    In this thesis, the structure from motion problem for calibrated scenes containing independently moving objects (IMOs) is studied. For this purpose, the overall reconstruction process is partitioned into stages. The first stage deals with the fundamental problem of estimating structure and motion from only two views. The process starts by finding salient features using a sub-pixel version of the Harris corner detector. The features are matched with a similarity- and neighborhood-based matcher. To reject outliers and estimate the fundamental matrix of the two images, robust estimation is performed via RANSAC and the normalized 8-point algorithm. Two-view reconstruction is finalized by decomposing the fundamental matrix and estimating the 3D point locations by triangulation. The second stage generalizes the two-view algorithm to the N-view case. This is accomplished by first reconstructing an initial framework from the first stage and then relating additional views by finding correspondences between each new view and the already reconstructed views. In this way, 3D-2D projection pairs are determined, and the projection matrix of the new view is estimated by a robust procedure. The final stage deals with scenes containing IMOs. To reject correspondences due to moving objects, the parallax-based rigidity constraint is used. In utilizing this constraint, an automatic background pixel selection algorithm is developed, and an IMO rejection algorithm is proposed. The results of the proposed algorithm are compared against those of a robust outlier rejection algorithm and found to be quite promising in terms of execution time vs. reconstruction quality. M.S. - Master of Science.